Take-home Exercise 3

In this Take-home Exercise, I will explore the economic of the city of Engagement, Ohio USA.

Che Xuan https://www.linkedin.com/in/jacob-che-xuan-b646a9123/
2022-05-15

Task

Challenge 3: Economic considers the financial health of the city. Over time, are businesses growing or shrinking? How are people changing jobs? Are standards of living improving or declining over time?

Consider the financial status of Engagement’s businesses and residents, and use visual analytic techniques to address these questions.

Overview

In this take-home exercise, appropriate static and interactive statistical graphics methods are used to reveal the economic of the city of Engagement, Ohio USA while addressing the questions stated in the Task section.

The data are processed by using appropriate tidyverse family of packages and the statistical graphics are prepared using ggplot2 and its extensions.

Sketch of Proposed Design

The picture below shows a sketch of the initial design proposed.

Installing & Launching R Packages

Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The chunk code below will do the trick.

packages = c('tidyverse', 'ggdist', 'ggridges', 'patchwork', 'ggthemes', 'lubridate', 'ggiraph', 'gganimate', 'plotly', 'DT', 'crosstalk')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

library(trelliscopejs)

Importing Data

The code chunk below imports TravelJournal.csv and Employers.csv from the data folder into R by using read_csv() of readr package and save them as tibble data frames called travel and employers.

travel <- read_csv("rawdata/TravelJournal.csv")
employers <- read_csv("rawdata/Employers.csv")

summary(travel)
 participantId    travelStartTime               travelStartLocationId
 Min.   :   0.0   Min.   :2022-03-01 05:00:00   Min.   :   1         
 1st Qu.: 221.0   1st Qu.:2022-06-10 17:35:00   1st Qu.: 449         
 Median : 464.0   Median :2022-10-03 18:40:00   Median : 913         
 Mean   : 480.5   Mean   :2022-10-05 05:21:39   Mean   :1016         
 3rd Qu.: 726.0   3rd Qu.:2023-01-28 06:20:00   3rd Qu.:1358         
 Max.   :1010.0   Max.   :2023-05-24 23:35:00   Max.   :1805         
                                                NA's   :1043         
 travelEndTime                 travelEndLocationId   purpose         
 Min.   :2022-03-01 05:35:00   Min.   :   0        Length:2099656    
 1st Qu.:2022-06-10 18:10:00   1st Qu.: 449        Class :character  
 Median :2022-10-03 19:00:00   Median : 910        Mode  :character  
 Mean   :2022-10-05 05:46:07   Mean   :1015                          
 3rd Qu.:2023-01-28 06:45:00   3rd Qu.:1358                          
 Max.   :2023-05-24 23:55:00   Max.   :1805                          
                                                                     
  checkInTime                   checkOutTime                
 Min.   :2022-03-01 05:35:00   Min.   :2022-03-01 06:00:00  
 1st Qu.:2022-06-10 18:10:00   1st Qu.:2022-06-10 21:40:00  
 Median :2022-10-03 19:00:00   Median :2022-10-03 22:47:30  
 Mean   :2022-10-05 05:46:07   Mean   :2022-10-05 09:53:15  
 3rd Qu.:2023-01-28 06:45:00   3rd Qu.:2023-01-28 08:30:00  
 Max.   :2023-05-24 23:55:00   Max.   :2023-05-25 00:05:00  
                                                            
 startingBalance    endingBalance     
 Min.   :  -681.6   Min.   :  -640.7  
 1st Qu.:  5077.8   1st Qu.:  5086.4  
 Median : 12006.9   Median : 12019.5  
 Mean   : 19573.7   Mean   : 19590.8  
 3rd Qu.: 25972.4   3rd Qu.: 25992.5  
 Max.   :240494.7   Max.   :240838.8  
                                      
summary(employers)
   employerId     location           buildingId    
 Min.   : 379   Length:253         Min.   :   3.0  
 1st Qu.: 829   Class :character   1st Qu.: 261.0  
 Median :1279   Mode  :character   Median : 486.0  
 Mean   :1089                      Mean   : 517.8  
 3rd Qu.:1734                      3rd Qu.: 782.0  
 Max.   :1797                      Max.   :1041.0  

Data Wrangling

Some of the new time-related fields have been added to travel with the following code chunk:

travel$year_month <- format(as.Date(travel$`travelEndTime`), "%Y-%m")
travel$day <- day(travel$`travelEndTime`)
travel$wkday <- weekdays(travel$`travelEndTime`)

Data frames travel and employers are saved in RDS format to avoid uploading large files to Git.

saveRDS(travel, 'data/travel.rds')
saveRDS(employers, 'data/employers.rds')
travel <- readRDS('data/travel.rds')
employers <- readRDS('data/employers.rds')

Static Histogram for Overview

We will first examine the overall trend of participants’ travels for various purpose in travel.

ggplot(data=travel, 
       aes(x = travelStartTime, 
           fill = purpose)) +
    geom_histogram(bins=15,
                   color="black") +
    scale_y_continuous(NULL,
                     breaks = NULL) +
    labs(y= 'Travel Count', x= 'Time',
       title = "Travel Trend by Purpose Over Time") +
    theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(),
        axis.line= element_line(color= 'grey'))

Static Trellis Plot

We have transformed the data accordingly and displayed a trellis plot partitioned by purpose in order to look at travel trend for respective purpose.

qplot(year_month, count, data = travel_count) +
  facet_wrap(~ purpose) +
  labs(y= 'Travel Count', x= 'Time',
     title = "Travel Trend for Respective Purpose Over Time") +
  theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(), axis.text.x=element_blank(),
      axis.line= element_line(color= 'grey'))

Further Data Wrangling - Work

We will zoom into work travel patterns and both travel and employers are joined using inner_join() function of dplyr.

travel_to_work <- travel %>%
  filter(purpose == 'Work/Home Commute') %>%
  inner_join(y=employers, by = c("travelEndLocationId" = "employerId")) %>%
  select(participantId, travelEndTime, year_month, day, wkday, travelEndLocationId, purpose, location, buildingId) %>%
  rename('employerId' = 'travelEndLocationId')

Renaming has been performed simplicity using the code chunk below:

travel_to_work$purpose <- sub('Work/Home Commute', 
                              'Work',
                              travel_to_work$purpose)

Static Trellis Plot - Line

We will now look at daily trend of work travel patterns for each month.

ggplot() + 
  geom_line(data=travel_to_work_count,
            aes(x=day, 
                y=count, 
                group=year_month), 
            colour="black") +
  facet_grid(~year_month) +
  labs(y= 'Travel \nCount', x= 'Month/Day',
     title = "Daily Travel Trend across Month") +
  theme(axis.title.y= element_text(angle=0),
      axis.line= element_line(color= 'grey'))

Static Trellis Plot - Dot

Due to the cluttered-ness of the previous plot, we will aim to improve the display using a trellis dot plot.

qplot(day, count, data = travel_to_work_count) +
  facet_wrap(~ year_month) + 
  labs(y= 'Travel \nCount', x= 'Month/Day',
     title = "Daily Travel Trend across Month") +
  theme(axis.title.y= element_text(angle=0), axis.text.x=element_blank(), axis.ticks.x=element_blank(),
      axis.line= element_line(color= 'grey'))

Static Trellis Plot - Dot 2

Since the facet function of ggplot2 is not useful for visualizing large data, we will use trelliscopejs instead.

qplot(day, count, data = travel_to_work_count) +
  facet_trelliscope(~ year_month, nrow = 2, ncol = 4, width = 600,
                    path = "trellis/",
                    self_contained=TRUE) +
  labs(y= 'Travel \nCount', x= 'Day') +
  theme(axis.title.y= element_text(angle=0), axis.text.x=element_blank(), axis.ticks.x=element_blank(),
      axis.line= element_line(color= 'grey'))

Interactive Bar Graph

We will again transform the data accordingly to look at monthly work travel trend using geom_bar_interactive() of ggiraph.

travel_to_work_by_month$tooltip <- c(paste0(
  "Purpose = ", travel_to_work_by_month$purpose,
  "\n Count = ", travel_to_work_by_month$count))

p <- ggplot(data=travel_to_work_by_month, 
       aes(x = year_month, y = count)) +
  labs(y= 'Travel Count', x= 'Month',
       title = "Monthly Travel Trend Over Time") +
    theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(),
        axis.line= element_line(color= 'grey')) +
  geom_bar_interactive(aes(tooltip = travel_to_work_by_month$tooltip,
                           data_id = year_month),
                       stat="identity")

girafe(
  ggobj = p,
  width_svg = 12,
  height_svg = 12*0.618
)

Interactive Bar Graph - Coordinated Mutiple Views

We will transform the data to compute for month-over-month turnover rate and join with the previous plot to form coordinated mutiple views.

travel_to_work_monthly_change <- travel_to_work_daily_count %>%
  group_by(employerId, year_month) %>%
  summarise(monthly_employees = max(count)) %>%
  mutate(mom_change = coalesce(monthly_employees - lead(monthly_employees),0),
         mom_turnover_rate = coalesce((monthly_employees - lead(monthly_employees))/monthly_employees,0)) %>%
  ungroup()
travel_to_work_mom$tooltip <- c(paste0(
  "MOM Turnover % =", round(travel_to_work_mom$avg_turnover*100,1), '%'))

p2 <- ggplot(data=travel_to_work_mom, 
       aes(x = year_month, y = avg_turnover)) +
  labs(y= 'Turnover %', x= 'Month',
       title = "Monthly Turnover Trend Over Time") +
    theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(),
        axis.line= element_line(color= 'grey')) +
  geom_bar_interactive(aes(tooltip = travel_to_work_by_month$tooltip,
                           data_id = year_month),
                       stat="identity") +
  geom_bar_interactive(aes(tooltip = travel_to_work_mom$tooltip,
                           data_id = year_month),
                       stat="identity")

girafe(code = print(p / p2),
       width_svg = 12,
       height_svg = 12,
       options = list(
         opts_hover(css = "fill: #202020;"),
         opts_hover_inv(css = "opacity:0.2;")
         )
       )

Interactive Calendar Heatmap

We will now look at the calendar heatmap of travel patterns by wkday across month using geom_tile() as well as ggplotly.

p3 <- ggplot(travel_to_work_by_day, 
       aes(year_month, 
           wkday, 
           fill = daily_employees)) + 
  geom_tile(color = "white", 
          size = 0.1) + 
  theme_tufte(base_family = "Helvetica") + 
  coord_equal() +
  scale_fill_gradient(name = "# of travels",
                    low = "sky blue", 
                    high = "dark blue") +
  labs(y= NULL, x= 'Month',
       title = "Wkday-ly Travel Trend across Month") +
  theme(axis.ticks = element_blank(),
      axis.text.x = element_text(size = 7),
      plot.title = element_text(hjust = 0.5),
      legend.title = element_text(size = 8),
      legend.text = element_text(size = 6) )

ggplotly(p3)

Animated Bubble Plot

We will rename year_month values to 1-15 periods in order to plot an animated bubble plot.

p4 <- ggplot(travel_to_work_initial, aes(x = mom_turnover_rate, y = strength_level, 
                      size = monthly_employees, 
                      colour = employerId)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
  scale_size(range = c(2, 12)) +
  labs(title = 'Period cumulative by year_month: {frame_time}', 
       x = 'Turnover %', 
       y = 'Employee %') +
  transition_time(year_month) +
  ease_aes('linear')

  animate(p4, nframes = 100, fps = 3)